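The ACII Affective Vocal Bursts Workshop & Competition focuses on understanding multiple affective dimensions of vocal bursts: laughs, gasps, cries, screams, and many other non-linguistic vocalizations that are central to the expression of emotion and to human communication. This year's competition comprises four tracks using a large-scale, in-the-wild dataset of 59,299 vocalizations from 1,702 speakers. The first, the A-VB-High task, requires participants to perform multi-label regression on a novel model of emotion, using ten classes of richly annotated emotional expression intensities, including awe, fear, and surprise. The second, the A-VB-Two task, uses the more conventional two-dimensional model of emotion: arousal and valence. The third, the A-VB-Culture task, requires participants to explore the cultural aspects of the dataset by training native-country-dependent models. Finally, in the fourth task, A-VB-Type, participants should recognize the type of vocal burst (e.g., laughter, cry, grunt) as an 8-class classification. This paper describes the four tracks and the baseline systems, which use state-of-the-art machine learning methods. The baseline performance for each track is obtained with an end-to-end deep learning model and is as follows: for A-VB-High, a mean (over the 10 dimensions) Concordance Correlation Coefficient (CCC) of 0.5687 is obtained; for A-VB-Two, a mean (over the 2 dimensions) CCC of 0.5084 is obtained; for A-VB-Culture, a mean CCC over the four cultures of 0.4401 is obtained; and for A-VB-Type, the baseline Unweighted Average Recall (UAR) over the 8 classes is 0.4172.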
translated by Google Translate
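The two metrics named in the abstract above, the Concordance Correlation Coefficient (CCC) and the Unweighted Average Recall (UAR), can be sketched as follows. This is a minimal illustration using their standard definitions, not the challenge's official scoring code; the function names `ccc`, `mean_ccc`, and `uar` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population (biased) variance and covariance, as in the usual definition.
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def mean_ccc(Y_true, Y_pred):
    """Average the CCC over the columns (emotion dimensions) of 2-D arrays."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    scores = [ccc(Y_true[:, d], Y_pred[:, d]) for d in range(Y_true.shape[1])]
    return float(np.mean(scores))


def uar(labels, preds):
    """Unweighted Average Recall: mean of per-class recall over the true classes."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    recalls = [np.mean(preds[labels == c] == c) for c in np.unique(labels)]
    return float(np.mean(recalls))
```

Because UAR averages recall per class rather than per sample, a classifier that always predicts the majority class scores only 1/K on a K-class problem, which is why it suits imbalanced type classification.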
We propose AnyTOD, an end-to-end task-oriented dialog (TOD) system with zero-shot capability for unseen tasks. We view TOD as a program executed by a language model (LM), where program logic and ontology are provided by a designer in the form of a schema. To enable generalization to unseen schemas and programs without prior training, AnyTOD adopts a neuro-symbolic approach. A neural LM keeps track of events that occur during a conversation, and a symbolic program implementing the dialog policy is executed to recommend next actions AnyTOD should take. This approach drastically reduces data annotation and model training requirements, addressing a long-standing challenge in TOD research: rapidly adapting a TOD system to unseen tasks and domains. We demonstrate state-of-the-art results on the STAR and ABCD benchmarks, as well as AnyTOD's strong zero-shot transfer capability in low-resource settings. In addition, we release STARv2, an updated version of the STAR dataset with richer data annotations, for benchmarking zero-shot end-to-end TOD models.
An outfit visualization method generates an image of a person wearing real garments from images of those garments. Current methods can produce images that look realistic and preserve garment identity, captured in details such as collar, cuffs, texture, hem, and sleeve length. However, no current method can both control how the garment is worn (tucked or untucked, open or closed, high or low on the waist, and so on) and generate realistic images that accurately preserve the properties of the original garment. We describe an outfit visualization method that controls drape while preserving garment identity. Our system allows instance-independent editing of garment drape, which means a user can construct an edit (e.g. tucking a shirt in a specific way) that can be applied to all shirts in a garment collection. Garment detail is preserved by relying on a warping procedure to place the garment on the body; a generator then supplies fine shading detail. To achieve instance-independent control, we use control points with garment category-level semantics to guide the warp. The method produces state-of-the-art quality images, while allowing creative ways to style garments, including allowing tops to be tucked or untucked; jackets to be worn open or closed; skirts to be worn higher or lower on the waist; and so on. The method also allows interactive control to correct errors in individual renderings. Because the edits are instance-independent, they can be applied to large pools of garments automatically and can be conditioned on garment metadata (e.g. all cropped jackets are worn closed or all bomber jackets are worn closed).
An approach to evolutionary ensemble learning for classification is proposed in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion and a residual dataset, i.e. the training records that thus far were not correctly classified. The next program is trained only against the residual, with the process iterating until some maximum ensemble size is reached or no residual remains. Training against a residual dataset actively reduces the cost of training. Deploying the ensemble as a stack also means that only one classifier might be necessary to make a prediction, thereby improving interpretability. Benchmarking studies are conducted to illustrate competitiveness with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.
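The residual-driven stacking loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: exhaustively searched threshold rules stand in for the evolved programs, and the way a stack member "defers" to the next one (a rule that either fires or stays silent) is an assumption made for the example. All names (`best_rule`, `rule_fires`, `fit_stack`, `predict`) are hypothetical.

```python
import numpy as np


def rule_fires(rule, X):
    """Boolean mask of the rows on which a (feature, threshold, direction, label) rule fires."""
    f, t, direction, _ = rule
    return X[:, f] <= t if direction < 0 else X[:, f] > t


def best_rule(X, y):
    """Pick the 'champion' rule: most correct firings net of incorrect ones."""
    best, best_score = None, -np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for direction in (-1, 1):
                fires = rule_fires((f, t, direction, None), X)
                if not fires.any():
                    continue
                for label in np.unique(y[fires]):
                    correct = np.sum(y[fires] == label)
                    score = 2 * correct - fires.sum()  # correct minus wrong
                    if score > best_score:
                        best, best_score = (f, t, direction, label), score
    return best


def fit_stack(X, y, max_size=5):
    """Build a stack of rules, each trained only on the current residual."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    values, counts = np.unique(y, return_counts=True)
    default = values[np.argmax(counts)]  # fallback if no rule fires
    stack = []
    residual = np.ones(len(y), dtype=bool)  # records not yet classified correctly
    while residual.any() and len(stack) < max_size:
        rule = best_rule(X[residual], y[residual])
        stack.append(rule)
        solved = rule_fires(rule, X) & (y == rule[3])
        residual &= ~solved  # the next rule sees only what is still unsolved
    return stack, default


def predict(stack, default, X):
    """Walk the stack in order; the first rule that fires makes the prediction."""
    X = np.asarray(X, dtype=float)
    out = np.full(len(X), default)
    undecided = np.ones(len(X), dtype=bool)
    for rule in stack:
        hit = undecided & rule_fires(rule, X)
        out[hit] = rule[3]
        undecided &= ~hit
    return out
```

In this sketch each round shrinks the training set, and at prediction time a record is labeled by the first rule that fires. Those are the two properties the abstract attributes to the stack: lower training cost and possibly only a single classifier per prediction.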
Many scientific domains gather sufficient labels to train machine algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types and data rates increases, acceleration of model training is of paramount concern to focus volunteer effort where most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets vs. images with similar characteristics possibly from similar tasks is an open challenge. We apply a generative segmentation model on two Zooniverse project-based data sets: (1) identifying fat droplets in liver cells (FatChecker; FC) and (2) identifying kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into the use of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion.
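Renal cell carcinoma (RCC) is a common cancer that varies in clinical behavior. Indolent RCC is often low-grade without necrosis and can be monitored without treatment. Aggressive RCC is often high-grade and can cause metastasis and death if not promptly detected and treated. While most kidney cancers are detected on CT scans, grading is based on histology from invasive biopsy or surgery. Determining aggressiveness on CT images is clinically important because it facilitates risk stratification and treatment planning. This study aims to use machine learning methods to identify radiology features that correlate with pathology features, to facilitate assessment of cancer aggressiveness on CT images instead of histology. This paper presents a novel automated method, Correlated Feature Aggregation By Region (CorrFABR), for classifying the aggressiveness of clear cell RCC by leveraging correlations between radiology images and corresponding unaligned pathology images. CorrFABR consists of three main steps: (1) feature aggregation, where region-level features are extracted from radiology and pathology images; (2) fusion, where radiology features correlated with pathology features are learned at the region level; and (3) prediction, where the learned correlated features are used to distinguish aggressive from indolent clear cell RCC using CT alone as input. Thus, during training, CorrFABR learns from both radiology and pathology images, but during inference it distinguishes aggressive from indolent clear cell RCC using CT alone, in the absence of pathology images. CorrFABR improved classification performance over radiology features alone, with the binary classification F1-score increasing from 0.68 (0.04) to 0.73 (0.03). This demonstrates the potential of incorporating pathology disease characteristics to improve classification of the aggressiveness of clear cell RCC on CT images.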
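We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired metric. At test time, it can optimize neural networks with unseen parameters in just one update. We find that our approach successfully generates parameters for a wide range of loss prompts. Moreover, it can sample multimodal parameter solutions and has favorable scaling properties. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.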
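Early diagnosis of Type 2 Diabetes Mellitus (T2DM) is crucial for timely therapeutic interventions and lifestyle modifications. As medical imaging data become more widely available for many patient populations, we sought to investigate whether image-derived phenotypic data could be leveraged in tabular learning classifier models to predict T2DM incidence without the use of invasive blood lab measurements. We show that both neural network and decision tree models that use image-derived phenotypes can predict patient T2DM status with recall scores as high as 87.6%. We also propose the novel use of these same architectures as "SynthA1c encoders" that are able to output interpretable values mimicking empirical lab measurements of blood hemoglobin A1C. Finally, we demonstrate that the sensitivity of T2DM risk prediction models to small perturbations in input vector components can be used to predict performance on covariates sampled from previously unseen patient populations.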
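Ideological divisions in the United States have become increasingly prominent in daily communication. Accordingly, there has been much research on political polarization, including many recent efforts that take a computational perspective. By detecting political biases in a corpus of text, one can attempt to describe and discern the polarity of that text. Intuitively, the named entities (i.e., the nouns and phrases that act as nouns) and hashtags in text often carry information about political views. For example, people who use the term "pro-choice" are likely to be liberal, whereas people who use the term "pro-life" are likely to be conservative. In this paper, we seek to reveal political polarities in social-media text data and to quantify these polarities by assigning polarity scores to entities and hashtags. Although this idea is straightforward, it is difficult to perform such inference in a trustworthy, quantitative way. Key challenges include the small number of known labels, the continuous spectrum of political views, and the preservation of both a polarity score and a polarity-neutral semantic meaning in a word embedding vector. To overcome these challenges, we propose the Polarity-aware Embedding Multi-task learning (PEM) model. This model consists of (1) a self-supervised context-preservation task, (2) an attention-based tweet-level polarity-inference task, and (3) an adversarial learning task that promotes independence between an embedding's polarity dimension and its semantic dimensions. Our experimental results demonstrate that our PEM model can successfully learn polarity-aware embeddings. We examine a variety of applications and thereby demonstrate the effectiveness of the PEM model. We also discuss important limitations of our work and stress caution when applying the PEM model to real-world scenarios.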
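In this work, we study how the performance and evaluation of generative image models are affected by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the effects of different training distributions on generated image quality and on the racial distribution of the generated images. Our results show that the racial composition of the generated images successfully preserves that of the training data. However, we observe that truncation, a technique used to generate higher-quality images during inference, exacerbates racial imbalances in the data. Finally, when examining the relationship between image quality and race, we find that the images of a given race with the highest perceived visual quality come from a distribution in which that race is well represented, and that annotators consistently prefer generated images of white people over those of Black people.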